Mapping Transcripts to Handwritten Text
نویسندگان
چکیده
In the analysis and recognition of handwriting, a useful first task is to assign ground truth for words in the writing. Such an assignment is useful for various subsequent machine learning tasks for performing automatic recognition, writer verification, etc. Since automatic word segmentation and recognition can be error prone, an intermediate approach is to use a text file that is a transcription of the handwriting image for performing ground truth assignment. This paper describes an algorithm for finding the best word level alignment between the transcript and the handwriting image. The algorithm is useful in tasks such as: (i) extracting words and characters as characteristic elements in writer verification and identification tasks; (ii) creating a large ground-truthed dataset for handwriting document analysis (in word and even character levels); (iii) indexing a collection of handwritten materials for document retrieval, such as for historical manuscripts. The algorithm achieves an 84.7% accuracy in aligning words on whole images when evaluated on 20 pages from a handwriting database created for forensic document examination studies.
منابع مشابه
Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
In order to facilitate the entry of data into the computer and its digitalization, automatic recognition of printed texts and manuscripts is one of the considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
متن کاملThe Comparison of Typed and Handwritten Essays of Iranian EFL Students in terms of Length, Spelling, and Grammar
This study attempted to compare typed and handwritten essays of Iranian EFL students in terms of length, spelling, and grammar. To administer the study, the researchers utilized Alice Touch Typing Tutor software to select 15 upper intermediate students with higher ability to write two essays: one typed and the other handwritten. The students were both males and females between the ages of 22 to...
متن کاملUnderstanding Handwritten Text in a Structured Environment: Determining ZIP Codes from Addresses
Understanding a block of handwritten text means mapping it into a semantic representation, We describe an approach to reading a I>lock of handwritten text when there arc certain loose constraints placed on the spatial layout and syntax 01 the tnt. Early recognition of primitives guides the location of syntactic components. A system to read handwritten postal addresses is described as an instanc...
متن کاملTranscript mapping for handwritten Chinese documents by integrating character recognition model and geometric context
Creating document image datasets with ground-truths of regions, text lines and characters is a prerequisite for document analysis research. However, ground-truthing large datasets is not only laborious and time consuming but also prone to errors due to the difficulty of character segmentation and the large variability of character shape, size and position. This paper describes an effective reco...
متن کاملConnected Component Based Word Spotting on Persian Handwritten image documents
Word spotting is to make searchable unindexed image documents by locating word/words in a doc-ument image, given a query word. This problem is challenging, mainly due to the large numberof word classes with very small inter-class and substantial intra-class distances. In this paper, asegmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...
متن کامل